WO 00/58507 PCT/GB00/01222 

POLYNUCLEOTIDE SEQUENCING 

Field of the Invention 

This invention relates to the sequencing of polynucleotides. In particular, this 
invention discloses methods for determining the sequence of arrayed polynucleotides. 
Background to the Invention 

Advances in the study of molecules have been led, in part, by improvements in 
technologies used to characterise the molecules or their biological reactions. In 
particular, the study of the nucleic acids DNA and RNA has benefited from developing 
technologies used for sequence analysis and the study of hybridisation events. 

An example of the technologies that have improved the study of nucleic acids 
is the development of fabricated arrays of immobilised nucleic acids. These arrays 
typically consist of a high-density matrix of polynucleotides immobilised onto a solid 
support material. Fodor et al., Trends in Biotechnology (1994) 12:19-26, describes 
ways of assembling the nucleic acids using a chemically sensitised glass surface 
protected by a mask, but exposed at defined areas to allow attachment of suitably 
modified nucleotide phosphoramidites. Fabricated arrays may also be manufactured 
by the technique of "spotting" known polynucleotides onto a solid support at 
predetermined positions (e.g. Stimpson et al., PNAS (1995) 92:6379-6383). 

A further development in array technology is the attachment of the 
polynucleotides to the solid support material via beads (microspheres). 

For DNA arrays to be useful, their sequences must be determined. US 
5302509 discloses a method to sequence polynucleotides immobilised on a solid 
support. The method relies on the incorporation of 3'-blocked bases A, G, C and T, 
each having a different fluorescent label, into a strand complementary to the 
immobilised polynucleotide in the presence of DNA polymerase. The polymerase 
incorporates a base complementary to the target polynucleotide, but is prevented from 
further addition by the 3'-blocking group. The label of the incorporated base can then 
be determined and the blocking group removed to allow further polymerisation to occur. 

However, the need to remove the blocking groups after each cycle is time- 
consuming, and the removal must be performed with high efficiency. 

Similarly, EP 0640146 discloses a polymerisation-based technique for 
sequencing DNA. The technique again requires removal of a blocking group prior to 
subsequent incorporation of nucleotides. 
There is therefore a need for alternative methods for determining the sequence 
of arrayed polynucleotides. 
Summary of the Invention 

In the general method of the invention, a target polynucleotide sequence can 
be determined by generating its complement using the polymerase reaction, by the 
extension of a suitable primer, and characterising the successive incorporation of 
bases that generate the complement. The method requires the target sequence to be 
immobilised on a solid support, with multiple copies of the target being localised within 
discrete regions. Each of the different bases A, T, G and C is then brought, by 
sequential addition, into contact with the target, and any incorporation events are 
detected. Repeating the procedure with each of the bases allows the sequence of the 
complement to be identified, and thereby the target sequence also. 

A distinguishing feature from the disclosure in US 5302509 is that the bases 
do not contain a blocking group preventing further polymerisation from occurring. In 
addition, the present invention requires the separate and serial addition of each of the 
different base types to the array and, when fluorophores are used as the label, 
removal of the label can be carried out efficiently by photobleaching. 

A further distinguishing feature, particularly relevant to EP 0640146, is that for 
each incorporation step only a minor proportion of the bases are detectably-labelled. 
Consequently, among the many copies of the target, relatively few will incorporate a 
labelled base into the complement. This permits the straightforward identification of 
any sequence containing two or more consecutive bases of the same type. In this 
case, copies of the target will incorporate differing amounts of the labelled base into 
the complement, resulting in differing levels of signal. It is then possible to determine 
quantitatively the number of consecutive bases on the complement by detecting the 
different levels of signal generated, as explained later. 

Accordingly, a method for determining the sequence of a target polynucleotide 
on an array, comprises the steps of: 

(i) forming an array comprising multiple copies of each target 
polynucleotide; 

(ii) contacting the array with a composition comprising one of the bases 
A, T, G or C under conditions that permit polymerisation to occur, 
wherein a minor proportion of the bases are detectably-labelled; 
(iii) detecting the incorporation of a base onto the complement of the 
target after removal of non-incorporated bases; and 
(iv) repeating steps (ii) and (iii) with each of the different bases until the 
sequence is determined. 
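The cycle of steps (ii) to (iv) can be illustrated with a short simulation. This is a minimal sketch, not part of the disclosed method: it assumes an ideal polymerase, complete removal of non-incorporated bases, and a template written 3'->5' starting at the base just downstream of the primer. The names `simulate_additions` and `complement_from_calls` are hypothetical. Because the bases carry no blocking group, a run of identical template bases is filled in a single addition, and the per-addition count is what the detection step has to recover.

```python
# Sketch of serial single-base additions (steps (ii)-(iv)) under ideal
# assumptions: perfect polymerase, complete washing, no blocking group.

# Maps a template base to the base incorporated opposite it.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_additions(template: str, order: str = "ACGT", max_cycles: int = 100):
    """Return (base_added, number_incorporated) for each addition.

    `template` is written 3'->5', i.e. in the order the polymerase copies it.
    """
    pos = 0
    calls = []
    for cycle in range(max_cycles):
        if pos >= len(template):
            break  # the whole template has been copied
        base = order[cycle % len(order)]
        n = 0
        # Without a 3'-blocking group, extension continues through every
        # consecutive template position that is complementary to `base`.
        while pos < len(template) and COMPLEMENT[template[pos]] == base:
            n += 1
            pos += 1
        calls.append((base, n))
    return calls


def complement_from_calls(calls):
    """Reassemble the synthesised complement from the per-addition counts."""
    return "".join(base * n for base, n in calls)


if __name__ == "__main__":
    calls = simulate_additions("TTGCA")
    print(calls)
    print(complement_from_calls(calls))
```

Here the two consecutive T residues in the template are both copied during the single A addition, which is why the method needs a quantitative readout rather than a yes/no signal.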
According to one aspect of the invention, the label from incorporated bases 
may be removed either prior to the addition of bases having the same label or before 
it becomes difficult to detect incorporation. 

According to a second aspect of the invention, when the label is a fluorophore, 
the fluorescence signal generated on nucleotide incorporation may be measured 
quantitatively, without the need to remove labels after each incorporation step. There 
is therefore provided a method for determining the sequence of a target polynucleotide 
as described above, wherein the fluorescence labels are not removed from the 
incorporated nucleotides, and subsequent detection of incorporation is carried out by 
measuring the stepwise increase in the fluorescence signal. 
The advantage of this embodiment is that it does not require the step of 
photobleaching and may therefore be carried out quickly and efficiently. 

Sequencing the polynucleotides on the array makes it possible to form a 
spatially addressable array. This may then be used for many different applications, 
including genotyping studies and other characterisation experiments. 
The method of the present invention may be automated to produce a very 
efficient and fast sequence determination. 
Description of the Drawings 

Figure 1 represents a fluorescence (left) or optical (right) image generated in 
the presence (A) and absence (B) of polymerase enzyme; and 
Figure 2 represents a fluorescence image generated from beads with 
fluorophore-labelled DNA attached (A) or with a fluorophore-labelled nucleotide 
incorporated into DNA using a polymerase (B). 
Description of the Invention 

The method for determining the sequence of the arrayed polynucleotides is 
carried out by contacting the array separately with the different bases to form the 
complement to that of the target polynucleotide, and detecting incorporation. The 
method makes use of polymerisation, whereby a polymerase enzyme extends the 
complementary strand by incorporating the correct base complementary to that on the 
target. The polymerisation reaction also requires a specific primer to initiate 
polymerisation. 

For each cycle, in which one base type is added to the array, only a minor 
proportion of the bases are detectably-labelled, i.e. less than 50% of the bases are 
detectably-labelled, preferably less than 20%. Therefore, only the incorporation of 
detectably-labelled bases can be monitored. The labelled bases are present at a fixed 
low concentration with respect to the non-labelled bases. The concentration may be 
chosen to permit a suitable incorporation rate of the labelled bases for efficient 
detection. For example, the concentration may be chosen to permit between 10% and 
0.0001% incorporation of labelled bases, preferably between 5% and 0.01%, most 
preferably between 1% and 0.1%. 
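To see why many copies per region matter at such low incorporation levels, the number of labelled copies in one region can be modelled statistically. This is an illustrative sketch under assumptions not stated in the text: each copy incorporates a labelled base independently with probability `labelled_fraction` (a binomial model), and a region holds 10,000 copies, the upper figure given for copies per region. The function name is hypothetical.

```python
import math


def labelled_count_stats(copies: int, labelled_fraction: float):
    """Mean and standard deviation of the number of copies in one region that
    incorporate a labelled base, assuming a binomial model in which each copy
    independently incorporates a label with probability `labelled_fraction`."""
    mean = copies * labelled_fraction
    sd = math.sqrt(copies * labelled_fraction * (1.0 - labelled_fraction))
    return mean, sd


for f in (0.01, 0.001):
    mean, sd = labelled_count_stats(10_000, f)
    # sd / mean is the relative spread of the labelled count between
    # regions that carry the same sequence.
    print(f"f={f:g}: mean={mean:.0f} labelled copies, sd={sd:.1f}, "
          f"relative spread={sd / mean:.2f}")
```

Under this model, 1% incorporation gives about 100 labelled copies per region with a relative spread of roughly 10%, while 0.1% gives about 10 with a spread of roughly 30%, so lower labelled fractions call for more copies per region.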

Using many copies of the same polynucleotide in discrete regions, it is possible 
to detect quantitatively the incorporation of a labelled base. For example, on 
incorporation of the adenosine nucleotide, a proportion of the polynucleotides will have 
a non-labelled adenosine nucleotide and a proportion will have a labelled adenosine 
nucleotide. Detecting the incorporation of the label allows a sequence 
determination to be made. If two adenosine nucleotides are incorporated 
consecutively into the complementary strand, a proportion of the polynucleotide copies 
will incorporate two non-labelled adenosine nucleotides, a proportion will incorporate 
one labelled adenosine and one non-labelled adenosine, and a proportion will 
incorporate two labelled adenosine nucleotides. However, the ratio of labelled to 
unlabelled nucleotide will be such that very few strands will incorporate more than one 
labelled nucleotide. This is especially preferable when fluorescent labels are used, 
since two labels on the same strand may cause fluorescence quenching or loss of 
linearity of the signal. 

The label will therefore be distributed throughout the population of a given sequence. 
Consequently, there will be a quantitative difference in the signal generated within the 
population of the given sequence. It is therefore possible to detect the incorporation 
of two consecutive bases of the same type from the quantitative difference in the signal. 
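The quantitative argument above can be checked numerically. The sketch below assumes, for illustration only, that each base in a run of identical bases is labelled independently with probability `p`, and that the detected signal is linear in the number of labels (no quenching). It shows that the expected signal from a region grows in proportion to the run length, while the fraction of strands carrying two or more labels stays of the order of `p` squared. The function names and the 10,000-copy region are assumptions.

```python
from math import comb


def label_distribution(run_length: int, p: float):
    """P(k labels on one strand) for k = 0..run_length, assuming each base
    in the run is labelled independently with probability p (binomial)."""
    return [comb(run_length, k) * p**k * (1 - p) ** (run_length - k)
            for k in range(run_length + 1)]


def expected_region_signal(copies: int, run_length: int, p: float, unit: float = 1.0):
    """Expected summed signal from one region, assuming a linear response of
    `unit` per incorporated label."""
    return copies * run_length * p * unit


p = 0.01
for n in (1, 2, 3):
    dist = label_distribution(n, p)
    multi = sum(dist[2:])  # fraction of strands carrying two or more labels
    print(n, expected_region_signal(10_000, n, p), f"{multi:.2e}")
```

With `p = 0.01`, runs of one, two and three bases give expected signals in the ratio 1:2:3, while only about 1 strand in 10,000 in a two-base run carries two labels, which is why quenching can be neglected under these assumptions.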
In the context of the invention, reference to the bases A, T, G and C is taken 
to be a reference to the deoxynucleoside triphosphates of adenosine, thymidine, 
guanosine and cytidine, and to functional analogs thereof, including 
dideoxynucleoside triphosphates. 

The terms "arrayed polynucleotides" and "polynucleotide arrays" are used 
herein to define an array of polynucleotides that are immobilised on a solid support 
material. The polynucleotides may be immobilised to the solid support indirectly 
through a linker molecule, or may be attached to a particle, e.g. a microsphere, which 
is itself attached to a solid support material. 

An important requirement is that there are multiple copies of each target 
polynucleotide on the array. Typically, these will be in discrete positioned regions on 
the solid support. Each discrete region may typically comprise several hundred to 
several thousand copies of the target polynucleotide. There may be, for example, up 
to 10,000 polynucleotide copies per region. The polynucleotides within each region 
preferably form a substantially uniform arrangement. This permits a high level of 
discrimination between individual polynucleotides, which may be preferable to resolve 
individual labels. However, the density of the polynucleotides is not necessarily of 
primary importance; the concentration of the labelled bases during the sequencing 
steps is also important, and this can readily be optimised by the skilled person. 

The term "spatially addressable" is used herein to describe how different 
molecules may be identified on the basis of their position on an array. 

The detection of an incorporated base may be carried out by using a confocal 
scanning microscope to scan the surface of the array with a laser, to image a 
fluorophore bound directly to the incorporated base. Alternatively, a sensitive 2-D 
detector, such as a charge-coupled device (CCD), can be used to visualise the 
individual signals generated. The use of such apparatus is known to the skilled 
person. However, other techniques such as scanning near-field optical microscopy 
(SNOM) are available which are capable of finer optical resolution, thereby 
permitting denser arrays to be used. For example, using SNOM, individual 
polynucleotides may be distinguished when separated by a distance of less than 100 
nm, e.g. 10 nm x 10 nm. For a description of scanning near-field optical microscopy, 
see Moyer et al., Laser Focus World (1993) 29:10. 
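The gain in density from finer optical resolution is simple geometry. As a rough sketch, assuming one resolvable polynucleotide per site on a square grid (an assumption for illustration, not a statement from the text), the number of sites per square micrometre follows directly from the centre-to-centre spacing; the 100 nm and 10 nm spacings are the figures quoted above, and the function name is hypothetical.

```python
def sites_per_um2(pitch_nm: float) -> float:
    """Resolvable sites per square micrometre on a square grid whose
    centre-to-centre spacing is `pitch_nm` nanometres (1 um = 1000 nm)."""
    if pitch_nm <= 0:
        raise ValueError("pitch must be positive")
    return (1000.0 / pitch_nm) ** 2


for pitch in (100.0, 10.0):
    print(f"{pitch:g} nm spacing: {sites_per_um2(pitch):,.0f} sites per um^2")
```

Under this assumption, going from 100 nm to 10 nm spacing raises the site density a hundredfold, from 100 to 10,000 sites per square micrometre.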

The polynucleotides that may be sequenced include DNA, RNA and synthetic 
alternatives such as PNA. 

The polynucleotides may be attached to the solid support by recognised 
means, including the use of biotin-avidin interactions or the use of amine linkages. In 
one embodiment, the polynucleotides are attached to the solid support via microscopic 
beads (microspheres), which may in turn be attached to the solid support by known 
means. The microspheres may be of any suitable size, typically in the range of from 
10 nm to 100 nm in diameter. 

Attachment via microspheres is a preferred embodiment as it allows discrete 
regions of polynucleotides to be easily generated on the array. Each microsphere 
may have multiple copies of a polynucleotide attached, and each microsphere can be 
resolved individually to determine incorporation events. 

The method makes use of the polymerisation reaction to generate the 
complementary sequence of the target. The conditions necessary for polymerisation 
to occur will be apparent to the skilled person. For example, a polymerase enzyme 
may be used to extend the complementary strand, and different polymerases, 
including DNA polymerases and RNA polymerases, are known to those skilled in the 
art. For example, the Klenow fragment of E. coli DNA polymerase I or T7 DNA 
polymerase may be used. To carry out the polymerase reaction it may be necessary 
first to anneal a primer sequence to the target polynucleotide, the primer sequence 
being recognised by the polymerase enzyme and acting as an initiation site for the 
subsequent extension of the complementary strand. Other conditions necessary for 
carrying out the reaction, including temperature and pH, will be apparent to those 
skilled in the art. 

This polymerisation step is allowed to proceed for a time sufficient to allow 
incorporation of all the correct bases. This will depend on the efficiency of 
incorporation and can be determined by the skilled person. Bases that are not 
incorporated are then removed, for example by subjecting the array to a washing 
step, and detection of the incorporated labels may then be carried out. 

Detection may be by conventional means; for example, if the label is a 
fluorescent moiety, detection may be carried out by optical microscopy, e.g. confocal 
scanning microscopy. 

A preferred embodiment of the invention uses fluorophores as the label, and 
many examples of fluorophores that may be used are known in the prior art, e.g. 
tetramethylrhodamine (TMR). 

After detection, the labels may be removed from the bases so that they do not 
interfere with the signal generated from the next cycle of incorporation. If the label is a 
fluorophore, it is possible to bleach the fluorophore by chemical means or through the 
use of a laser (photobleaching). Alternatively, the label may be removed by chemical 
or photochemical means. 
The process of incorporating bases may then be repeated using each of the 
different bases until the sequence has been determined. 

It may not always be necessary to remove the labels prior to the addition of the 
next base sample. Different bases may have distinguishable labels, and so it will only 
be necessary to remove incorporated labels prior to adding bases having an identical 
label. 
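The rule that labels need only be removed before adding a base with an identical label can be expressed as a small scheduling check. This is a sketch under stated assumptions: incorporated labels stay detectable until removed, and a single removal step (e.g. photobleaching) clears every incorporated label; if removal were label-specific, the bookkeeping would differ. The names `removal_schedule` and `label_of`, and the label names used, are hypothetical.

```python
def removal_schedule(additions, label_of):
    """For each base addition, report whether incorporated labels must be
    removed first.

    additions: bases in the order they are added to the array.
    label_of:  dict mapping each base to its label (hypothetical labels).
    Assumes one removal step clears all incorporated labels.
    """
    live = set()  # labels currently present on the array
    schedule = []
    for base in additions:
        label = label_of[base]
        must_remove = label in live
        if must_remove:
            live.clear()  # the removal step clears every incorporated label
        live.add(label)
        schedule.append((base, must_remove))
    return schedule


one_label = {b: "TMR" for b in "ACGT"}
four_labels = {"A": "L1", "C": "L2", "G": "L3", "T": "L4"}
print(removal_schedule("ACGT", one_label))
print(removal_schedule("ACGTACGT", four_labels))
```

With a single label, removal is needed before every addition after the first; with four distinguishable labels, one removal per round of four additions suffices under these assumptions.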

In one embodiment, fluorescent labels are used and detection is carried out by 
optical means without the requirement for removing labels between incorporation 
steps. 

For example, a confocal microscope may be used to scan the array and 
measure quantitatively the stepwise increase in fluorescence after each cycle of 
incorporation. By measuring the increase in the amount of fluorescence after each 
cycle, and not the absolute amount, it should be possible to determine whether two or 
more nucleotides have been incorporated consecutively onto the template. This 
method relies on using sensitive detectors (e.g. charge-coupled devices) to measure 
the increase in signal. Suitable apparatus for carrying out the method is available 
commercially and will be apparent to the skilled person. 
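Reading run lengths from the stepwise increase, rather than from the absolute level, can be sketched as follows. The sketch assumes a linear detector response and a calibrated `unit_step`, the signal increase produced when one base is incorporated per template copy at the chosen labelled fraction; both that parameter and the function name are hypothetical, and the example readings are made-up numbers, not data from the source. A real instrument would also need background and drift correction.

```python
def run_lengths_from_readings(readings, unit_step):
    """Estimate how many bases were incorporated at each addition from
    cumulative fluorescence readings taken without removing labels.

    readings[0] is the background before the first addition; each later
    entry is the total signal after one addition. unit_step is the assumed,
    calibrated increase for one incorporated base per copy.
    """
    counts = []
    for before, after in zip(readings, readings[1:]):
        increase = after - before  # use the step, not the absolute level
        counts.append(max(0, round(increase / unit_step)))
    return counts


# Hypothetical readings (arbitrary counts) after four successive additions.
print(run_lengths_from_readings([50, 152, 148, 352, 455], unit_step=100))
```

For these illustrative readings the estimated incorporations per addition are 1, 0, 2 and 1: the third addition filled a two-base run, which the absolute level alone (352) would not reveal.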

In a separate embodiment of the invention, the labelled bases may be modified 
so that on incorporation, no further bases may be added. Bases that carry out this 
chain-terminating function include the dideoxynucleoside triphosphates, as used in 
conventional Sanger sequencing (Proc. Natl. Acad. Sci. USA 74: 5463-5467, 1977). 

Therefore, after each incorporation step, a proportion of the polynucleotides 
will incorporate a labelled base that prevents further chain extension. The number of 
polynucleotides available for the polymerisation step will gradually decrease as the 
sequencing method proceeds. However, provided there are sufficient copies of the 
polynucleotide, and provided the concentration of the labelled, chain-terminating 
bases is sufficiently low, it should be possible to sequence the target polynucleotides. 
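How quickly the pool of extendable copies shrinks can be estimated if each incorporation is assumed to terminate a copy independently, with probability equal to the fraction of incorporated bases that carry the chain-terminating label. This is an illustrative model, not a statement from the text, and the function names are hypothetical.

```python
import math


def extendable_fraction(labelled_fraction: float, incorporations: int) -> float:
    """Fraction of copies still extendable after `incorporations` bases, if
    each incorporation terminates a copy with probability `labelled_fraction`."""
    return (1.0 - labelled_fraction) ** incorporations


def max_incorporations(labelled_fraction: float, min_remaining: float) -> int:
    """Largest number of incorporations that still leaves at least
    `min_remaining` of the copies extendable under the same model."""
    return math.floor(math.log(min_remaining) / math.log(1.0 - labelled_fraction))


print(f"{extendable_fraction(0.01, 100):.3f}")
print(max_incorporations(0.01, 0.5))
```

Under this model, with 1% chain-terminating label about 37% of copies remain extendable after 100 incorporations and half remain after 68, whereas at 0.1% the same 100 incorporations leave about 90%.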

The following experiment illustrates the invention. 
Example 

In this experiment, a fluorescently-labelled DNA molecule (SEQ ID NO. 1) was 
coupled directly to beads and the level of fluorescence was measured using an inverted 
Nikon microscope with an ICCD detector in an epifluorescence set-up. In a separate 
reaction, an unlabelled DNA (SEQ ID NO. 2) was attached to beads and a 
fluorescently-labelled nucleotide was incorporated onto the DNA (SEQ ID NO. 2) using 
a polymerase. By comparing the average level of fluorescence between the two sets 
of beads, the efficiency of incorporation of the fluorescently-labelled nucleotide was 
shown to be 89%. For this comparison, the fluorescent beads were diluted in 
unmodified beads so that each fluorescent bead could be detected individually. 

By measuring the signal-to-noise ratio in the experiment, an estimate can be 
obtained of the fraction of incorporated nucleotides that must be labelled with a 
fluorophore for incorporation to be detected. This fraction is less than 1%, i.e. it is 
possible to detect incorporation when the concentration of fluorescently-labelled 
nucleotides is such that only 1% of the incorporated nucleotides are labelled and the 
remaining 99% are non-labelled. 

The experiment is now described in more detail. 

DNA Coupling 

Carboxylic acid-modified beads (both non-porous polystyrene and silica) of 
sizes 0.5-2.9 µm were placed in Milli-Q water (typically 1 mg per 50 ml). 
1-(3-Dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC) (1 mg) and the 
oligonucleotide (added to give a final concentration of 10 µM) were added, and the 
beads were agitated by vortexing and left for 12 h at room temperature. The beads were 
washed with 0.15 M NaOH, twice with TT buffer (250 mM Tris-HCl, pH 8.0, 0.1% 
Tween 20), heated at 80 °C in TTE (250 mM Tris-HCl, pH 8.0, 0.1% Tween 20, 20 mM 
sodium EDTA) and rinsed with water. To achieve a dilute array, the beads were 
sonicated in 200 µl water and 2.5 µl was evaporated onto a heated slide. 
Enzyme Incorporation 

A solution of the 51mer (SEQ ID NO. 3) (4 µM; 2 equiv.) in hybridisation buffer 
(5 mM MgCl2, 7.5 mM DTT, 10 mM Tris-HCl (pH 7.6), 0.005% Triton X-100) (20 µl) 
was added to 0.05 mg of beads (containing SEQ ID NO. 2), which were heated to 
90 °C for 2 min and allowed to cool for 1 h. The fluorescent dUTP (400 µM stock, 
0.5 µl, 10 µM; 4 equiv.) was added. A fraction of the beads was removed as a washing 
control, and the polymerase (Sequenase) (0.5 µl, 6.5 units; one unit will incorporate 
1 nmol dNTP in 30 s at 37 °C) was added. The reaction was left at room temperature 
for 4 h, and the beads were washed with NaOH, TT and TTE buffers as above and 
arrayed onto a coverslip. 



The oligos used in this study are as follows: 
3'-(TAMRA)AGCGTCGGCAGGTATCCCAA-(C6 amino)-5', SEQ ID NO. 1 
and unlabelled: 

5'-(amino)-GTCATCGAACGTCGAGCCTCGCAGCCGTCCAACCAACTCA-3', SEQ ID NO. 2 

and 

3'-GTAGCTTGCAGCTCGGAGCGTCGGCAGGTTGGTTGAGTAGGTCTTGTTT-5', SEQ ID NO. 3 

as hybridised template. 

Figure 1 shows the fluorescence image on the left and the optical image on the 
right when the experiment on the incorporation of fluorescently-labelled dUTP was 
performed in the presence (A) and absence (B) of the polymerase. It is clear that no 
fluorescence is detected in the absence of enzyme. 

Figure 2 shows the beads diluted in unmodified beads so that a quantitative 
analysis can be performed. The top image (A) shows the fluorescence from beads 
with fluorophore-labelled DNA attached, and the lower image (B) shows the level of 
fluorescence when the fluorophore-labelled nucleotide is incorporated into the DNA 
using a polymerase. The values of the fluorescence from the beads were compared: 

(A) 3'-TAMRA DNA: average counts/bead = 1956 (54 beads, ±50%) 

(B) unblocked carboxylic acid-modified beads: average counts/bead = 1739, i.e. 
89% of (A) (88 beads, ±40%) 

This means that the incorporation efficiency of the labelled nucleotide is 89%. 
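The 89% figure is the ratio of the two average per-bead counts. The short calculation below reproduces it; the function name is hypothetical, and the bead-to-bead spreads quoted above (±50% and ±40%) are not propagated, so the ratio should be read as an estimate.

```python
def incorporation_efficiency(mean_incorporated: float, mean_reference: float) -> float:
    """Ratio of the mean per-bead counts after enzymatic incorporation (B) to
    the mean per-bead counts from directly labelled DNA (A)."""
    return mean_incorporated / mean_reference


eff = incorporation_efficiency(1739, 1956)
print(f"{eff:.1%}")  # prints 88.9%
```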

By comparing the signal-to-noise ratio in Figure 1 between the level of fluorescence 
when the enzyme is present and when it is absent, it is possible to estimate that the 
level of fluorescence could be reduced by a factor of 100-1000 while still allowing the 
detection of fluorescence above the background with adequate signal-to-noise. This 
means that experiments can be performed with the fluorophore-labelled nucleotides 
highly diluted in non-labelled nucleotides, so that only 1% of the incorporated 
nucleotides are fluorophore-labelled. 
CLAIMS 

1. A method for determining the sequence of a target polynucleotide, comprising 
the steps of: 

(i) contacting an array comprising multiple copies of the target 
polynucleotide with one of the bases A, T, G or C in a form and under conditions that 
permit polymerisation to occur, wherein a minor proportion of the base is detectably- 
labelled; 

(ii) detecting the incorporation of the base onto the complement of the 
target after removal of non-incorporated base; and 

(iii) repeating steps (i) and (ii) with each of the different bases until the 
sequence is determined. 

2. A method according to claim 1, wherein the label is a fluorescent moiety. 

3. A method according to claim 2, wherein step (ii) is carried out using optical 
means. 

4. A method according to claim 3, wherein the optical means is a confocal 
scanning microscope. 

5. A method according to any of claims 2 to 4, wherein step (ii) is carried out by 
measuring quantitatively the fluorescence signal generated on nucleotide 
incorporation. 

6. A method according to any preceding claim, wherein the label is not removed 
from the incorporated nucleotides, and subsequent detection of incorporation is 
carried out by measuring the stepwise increase in the signal. 

7. A method according to any of claims 1 to 5, wherein the label from 
incorporated bases is removed either prior to the addition of base having the same 
label or before it becomes difficult to detect incorporation. 

8. A method according to claim 7, wherein the label is removed by 
photobleaching. 

9. A method according to any preceding claim, wherein the proportion is less than 
10%. 

10. A method according to claim 9, wherein the proportion is less than 1%. 

11. A method according to any preceding claim, wherein the polynucleotide is 
attached to the array via microspheres. 

12. A method according to any preceding claim, wherein the detectably-labelled 
bases are dideoxynucleoside triphosphates. 